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Abstract. Lattice fermions with suppressed high momentum modes solve 
the ultraviolet slowing down problem in lattice QCD. This paper describes a 
stochastic evaluation of the effective action of such fermions. The method is a 
based on the Lanczos algorithm and it is shown to have the same complexity 
as in the case of standard fermions. 
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1. Introduction 

There has been recent interest in the so-called 'Ultraviolet slowing down' of 
fcrmionic simulations in lattice QCD [Irving et al, 1998, Duncan, Eichten and Thacker, 1999, 
de Forcrand, 1999, Pcardon, Lattice2001, A. Hasenfratz & Knechtli, 2001]. These 
studies try to address algortmically large fluctuations of the high end modes of the 
fermion determinant. The goal is to increase the signal-to- noise ratio of the infrared 
modes and to accelerate fermion simulations as well. 

In fact, all the computational effort needed to treat UV-modes by above algo- 
rithms can be reduced to zero by suppressing them in the first place [Borigi, 2002]. 
The lattice Dirac operator of this fermion theory is given by: 

(1.1) £=^r 5 tanh ar5Ay/g/ ° 

a fi 

where D w / S / D is the input lattice Dirac operator, a the lattice spacing and [i > is 
a dimcnsionlcss parameter. For Wilson (W) and overlap (o) fermions as the input 
theory one has Fs = 75 . For staggered fermions is a diagonal matrix with entries 
+1/ — 1 on even/odd lattice sites. The theory converges to the input theory in the 
contimuum limit and is local and unitary as shown in detail in [Borigi, 2002]. The 
input theory is also recovered in the limit /i — > 00. For jj, — > one has D — > /j,, i.e. 
a quenched theory. 

Perturbative calculations with this theory are straightforward. To fix the idea I 
assume in the following Wilson fermions to be input fermions. The inverse fermion 
propagator is given by 

(1.2) ^ W = ^ 75 tanh^M 

with p = {p v ,v = 1,... ,4} being the four-momentum vector. As usual, gauge 
fields are parametrized by sw(3) elements: 

(1.3) U(x) v = e iasA(x ^, A{x) v e su(3) 

and the Wilson operator is written as a sum of the free and interaction terms: 

D W = D° W + D{y 

The splitting of the lattice Dirac operator is written in the same form: 



w 



D = D a + D\ Z?° = i- 7 5tanh 

a [i 

where the interaction term has to be determined. This can be done by expanding 
D in terms of a/fi: 

(1.4) D = D W [t+c 1 (^r+c 2 ( a J^r+...] 

where c\, C2, ■ ■ ■ are real expansion coefficients. Calculation of D 1 is an easy task 
if one stays with a finite number of terms in the right hand side of (1.4). Also, 
the number of terms can be minimized using a Chebyshev approximation for the 
hyperbolic tangent. 1 
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In this paper I describe computational methods needed to evaluate the effec- 
tive action of the theory defined above. In particular, the complexity of the pro- 
posed Lanczos method does not depend on the input sparse matrix that describes 
a fermion theory on the lattice. 

In the following section I derive a class of Lanczos based methods for computa- 
tions with the proposed theory and then in section 3 conclusions follow. 

2. Lanczos based methods for computations with fermions 
The effective action of the theory defined above can be written as: 



A is assumed to be Hermitian and positive definite. Since the trace is difficult to 
obtain one can use the stochastic method of [Bai et al, 1996]. The method is based 
on evaluations of many bilinear forms of the type: 



where b € R is a random vector. The trace is estimated as an average over many 
bilinear forms. A confidence interval can be computed as described in detail in 
[Bai et al, 1996]. 

The method described here is similar to the method of [Bai et al, 1996]. Its 
viability for lattice QCD computations has been demonstrated in the recent work of 
[Cahill et al, 1999]. [Bai et al, 1996] derive their method using quadrature rules and 
Lanczos polynomials. Here, I give an alternative derivation which uses familiar tools 
(in lattice simulations) such as sparse matrix invertions and Pade approximations. 
The Lanczos method enters the derivation as an algorithm for solving linear systems 
of the form: 



2.1. Lanczos algorithm. I follow standard standard texts as [Golub & Van Loan, 1989] 
and notations and arguments of [Borigi, 1999b, Borigi, 2000a, Borigi, 2000b]. n 
steps of the Lanczos algorithm [Lanczos, 1952] on the pair (A, b) are given by Al- 
gorithm 1. 



Algorithm 1 The Lanczos algorithm 

Set/3 = 0, q = o, qi =b/\\b\\ 2 
for i = 1, . . .n do 

V = Aqi 



v := v - q l a i - q i ^ 1 (3 i - 1 
AHMI2 
Qi+l = v/fa 
end for 



The Lanczos vectors qi, ■ ■ ■ ,q n € C can be compactly denoted by the matrix 
Qn = [qi, ■ ■ ■ , q n ] ■ They are a basis of the Krylov subspace JC n — span{6, Ab, ... , A n ~ 1 b} . 
It can be shown that the following identity holds: 




(2.2) 



T(b,A) = b T f{A)b 



(2.3) 



Ax = b, x e C 




(2.4) 



AQ n = Q n T n + n q n+1 eZ, qi = b/\\b\\ 2 
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e„ is the last column of the identity matrix 1„ eP x " and T n is the tridiagonal 
and symmetric Lanczos matrix (2.5) given by: 



(2.5) T n 



(ax Pi 

Pi Oi 2 
\ Pn-1 OL n ) 



The matrix (2.5) is usually referred to as the Lanczos matrix. Its eigenvalues, the 
so called Ritz values, tend to approximate the extreme eigenvalues of the original 
matrix A. 

To solve the linear system (2.3) I seek an approximate solution x n e JC n as a 
linear combination of the Lanczos vectors: 

(2.6) x n = Q nVni y n e C" 

and project the linear system (2.3) on to the Krylov subspace JC n : 

QtAQ n y n = Qlb = Ql qi \\b\\2 
Using (2.4) and the orthonormality of Lanczos vectors, I obtain: 

TnVn = ei||6||2 

where e\ is the first column of the identity matrix l n . By substituting y n into (2.6) 
one obtains the approximate solution: 

(2.7) x n = Q n T- 1 e 1 \\b\\ 2 

2.2. Algorithms for the bilinear form (2.2). The theoretical framework of the 
algorithm of [Bai et al, 1996] can be based on the Pade approximant of the smooth 
and bounded function /(.) in an interval. The Pade approximation can be expressed 
as a partial fraction expansion. Therefore, one can write: 



(2.8) f(s) « £ 



c k 



k=i s + dk 

with Cfe € K, dk > 0, k = 1, . . . , m. It is assumed that the right hand side converges 
to the left hand side when the number of partial fractions becomes large enough. 
For the bilinear form I obtain: 



(2.9) T(b,A)^Y. h 



T c k 



fe=i A + 41 

A first algorithm can already be written down at this point. Having com- 
puted the partial fraction coefficients one can use a multi-shift iterative solver of 
[Freund, 1993] to evaluate the right hand side (2.9). To see how this works, I solve 
the shifted linear system: 

(A + d k l)x k = b 

using the same Krylov subspace JC n . A closer inspection of the Lanczos algorithm, 
Algorithm 1 suggests that in the presence of the shift dk I get: 

ot ^a, + d k 
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while the rest of the algorithm remains the same. This is the so called shift- 
invariance of the Lanczos algorithm. From this property and by repeating the 
same arguments which led to (2.7) I get: 

(2-10) x k n = Qn T J. , ei||6|| 2 

Using the shift-invariance of the Lanczos algorithm I obtain Algorithm 2. 

Algorithm 2 The Lanczos algorithm for solving (A + d k ^)x k = b. 

Set 0o = 0, p\ = I/H&H2, Qo =0, q 1 = p\b,x$ = o, x% = o, = 
for i = 1, . . . do 

V = Aq { 
on = qjv 

v := v - qiOLi - qi-iPi-i 
Pi = \\v\\2 

qi+i = v/Pi 

for k = 1, . . . , m do 

x k i+l = -{x k a t + 

Pi+i = ~(P k i a * + ($-i0i-i)/0i 
r k l+1 = q i+ i/p k +i 

end for 

if l/|pj +1 | <ethen 
n = i 
stop 
end if 
end for 



Note that the residual errors r k ,i = 1, . . . ,n,k = 1, . . . ,m are given by: 

r k = b- Ax k i - d k x k 
In exact arithmetic their norm is given by: 

(2.11) l/p k = \\b-Ax k -d k x k \\ 2 

By applying Algorithm 2 one can solve the shifted linear systems on the right 
hand side of (2.9). The algorithm stops if the linear system with the smallest shift 
is solved to the desired accuracy e. This is a well-known technique [Freund, 1993] 
which is used also in lattice QCD [Frommcr et al, 1995]. However, the problem with 
this method is that one needs to store a large number of vectors that is proportional 
to m. This could be prohibitive if m is say larger than 10. 

In fact, the right hand side of (2.9) can be written in terms of solutions x k , k = 
1, . . . , m as a sum of scalars: 

m 

(2.12) F{b, A) w c k w k , w k = b T x k 

k=l 

Therefore, it is easy to replace the vector recurrences by scalar recurrences of the 
form: 

(2.13) w k +1 = -{w k a t + & 
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In this way one obtains the Algorithm 3. It is clear that by applying Algorithm 
3 one gains substantial storage savings compared to Algorithm 2. If one has a good 
Pade approximant for the function /(.) one can apply Algorithm 3. Note that one 



Algorithm 3 The Lanczos algorithm for computing w k , k = 1, . . . , m. 

Set P = 0, p\ = I/H&H2, Qo =0, qi = p{b,w$ = o, = o, = 
for i = 1, . . . do 

v = Aqi 
o-i = qjv 

V :=V - qidi - qi-ifii-i 

A = IH|2 
Qi+i = V/& 
for k = 1, . . . , m do 
= -(fife + 

= -(P?«i + rf-i0i-i)/Pi 
end for 

if <ethen 

n = 2 

stop 
end if 
end for 



another way to save storage is using the multi-shift Gonjugate Gradient variant of 
[B.Jegerlehner, 1998]. 

2.3. An exact method. If a Pade approximation is not sufficient or difficult to 
obtain, the Lanczos method is the only alternative to evaluate exactly the bilinear 
forms of type (2.2). 

To see how this is realized I assume that the linear system (2.3) is solved to 
the desired accuracy using the Lanczos algorithm, Algorithm 1 and (2.7). In the 
application considered here one can show that: 



For the result (2.14) to hold, it is sufficient to show that: 
>? Ck 7 6=i]611 2 ef - °\ - ei 



A + d k t 11 11 1 T n + d k t n 

which can be shown using the orthonormality property of the Lanczos vectors and 
(2.10). Note however that in presence of roundoff errors the orthogonality of the 
Lanczos vectors is lost but the result (2.14) is still valid. The interested reader may 
consult the work of [Cahill et al, 1999, Golub & Strakos, 1994]. 

From this result and the convergence of the partial fractions to the matrix func- 
tion /(.), it is clear that: 

(2.15) T{b, A) w T n {b, A) = 1 16| \ 2 e\ f(T n ) ei 
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Note that the evaluation of the right hand side is a much easier task than the 
evaluation of the right hand side of (2.2). A straightforward method is the spectral 
decomposition of the symmetric and tridiagonal matrix T„: 

(2.16) r n = z n n n z? 

where Q n € M. nxn is a diagonal matrix of eigenvalues u>i, . . . ,lu u of T n and Z n G 
jjnxn j g corr sponding matrix of eigenvectors, i.e. Z n = [z\, ... , z n ]. From (2.15) 
and (2.16) it is easy to show that (see for example [Golub & Van Loan, 1989]): 

(2.17) T n (b,A) = \ \b\\ 2 e T l Z n f(n n )Zle 1 

where the function /(.) is now evaluated at individual eigenvalues of the tridiagonal 
matrix T n . 

The eigenvalues and eigenvectors of a symmetric and tridiagonal matrix can be 
computed by the QL method with implicit shifts [Press et al, 1993]. The method 
has an 0(n 3 ) complexity. Fortunately, one can compute (2.17) with only an 0(n 2 ) 
complexity. Closer inspection of eq. (2.17) shows that besides the eigenvalues, only 
the first elements of the eigenvectors are needed: 

(2.18) A(M)HHI 2 E4/M 

1=1 

It is easy to see that the QL method delivers the eigenvalues and first elements of 
the eigenvectors with 0(n 2 ) complexity. 2 

A similar formula (2.18) is suggested by [Bai et al, 1996]) based on quadrature 
rules and Lanczos polynomials. The Algorithm 4 is thus another way to compute 
the bilinear forms of the type (2.2). 



Algorithm 4 The Lanczos algorithm for computing (2.2). 

Set (3 = 0, pi = I/H&H2, qo =0, qi = pib 
for i = 1, . . . do 

V = Aq { 
a t = q\v 

v : = v - q i0li - q i _ 1 (3 i _ 1 
AHMI2 

qi+i = v/0i 

Pi+i = -{pi&i + pi-i(3i-i) / & 

if l/|p,+i| < e then 

n = i 

stop 
end if 
end for 

Set (T n ) M = ai, (T„) 4+M = (T„) M+1 = 0i, otherwise (T n ) itj = 
Compute u>i and Zu by the QL method 
Evaluate (2.2) using (2.18) 



Clearly, the Lanczos algorithm and Algorithm 3 has an 0(nN) complexity, 
whereas Algorithm 4 has a greater complexity: 0(nN) + 0(n 2 ). However, Al- 
gorithm 4 delivers an exact evaluation of (2.2). For typical applications in lattice 



2 I thank Alan Irving for the related comment on the QL implementation in [Press et al, 1993]. 
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Figure 1. Normalized recursive residual (solid line) and relative 
differences of (2.18) (dotted line) produced by Algorithm 4. 

QCD the 0(n/N) overhead is small and therefore Algorithm 4 is the recommended 
algorithm among all three algorithms presented in this section. 

A remark on stopping criteria is also desirable. The method of [Bai et al, 1996] 
computes the relative differences of (2.18) between two successive Lanczos steps 
and stops if they don't decrease below a given accuracy. In order to perform the 
test their algorithm needs to compute the eigenvalues of Tj at each Lanczos step i. 
This may be a large computational overhead. On the other hand the test proposed 
here is theoretically safe. This is clarified by the remark at the end of the proof of 
the lemma (2.14). However, this test may be too prudent since the prime interest 
here is the computation of the bilinear form (2.2). 

To illustrate this situation I give an example from a lattice calculation (The 
lattice size and parameters are given in section 5.1). I compute the bilinear form 
(2.2) for: 

(2.19) f(s) = logtanhVs, s £ M+ 

and A = H^, b £ C N . The real and imaginary parts of the b— elements are chosen 
randomly from the set {+1,-1}. 
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In Fig. 1 are shown the normalized recursive residuals po/pi = 1 1 b — Axi || 2 /||6|| 2 ,i = 
1, . . . , n and relative differences of (2.18) between two successive Lanczos steps. The 
figure illustrates clearly the different regimes of convergence for the linear system 
and the bilinear form. The relative differences of the bilinear form converge faster 
than the computed recursive residual. This example indicates that a stopping cri- 
terion based on the solution of the linear system may indeed be strong. Therefore, 
the recommended stopping criteria would be to monitor the relative differences of 
the bilinear form but less frequently than proposed by [Bai et al, 1996]. More in- 
vestigations are needed to settle this issue. Note also the roundoff effects (see Fig. 
1) in the convergence of the bilinear form. 

3. Conclusion 

In this paper I have described computational methods needed to evaluate the 
effective action of the theory with suppressed cutoff modes. 

All methods described in this paper have a complexity that does not depend 
on the input sparse matrix that describes the fermion theory on the lattice. In 
this way, it may be concluded that simulation algorithms of lattice QCD which are 
based on the estimation of the effective fermion action have the same complexity. 
The UV-suppressed fermions are such an example. 
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